Introduction

Abstract

Catching Some Z’s: An Analysis of Factors of a Good Night’s Sleep

Abstract

Research Questions

  • Do people who exercise generally have higher quality sleep?

  • Is there any correlation between bedtime and the amount of deep sleep someone gets?

  • Do smoking, alcohol, and caffeine negatively impact the amount of deep sleep we get in a night or just sleep in general?

  • Do age or gender have an impact on the amount and quality of sleep?

Background and Significance

Significance

Sleep affects almost every aspect of our daily lives. It gives us energy throughout the day and allows us to recharge mentally and physically. However, it often seems like we can’t get enough of it. Sleep is a complex process with many moving parts, and this study aims to shed light on which factors affect it and what we can do to get more, and better quality, sleep.

Background

Sleep is primarily composed of 3 broad stages: light sleep, deep sleep, and REM sleep. As you sleep, you cycle through these stages several times, with the deep sleep periods growing shorter and the REM periods growing longer as the night goes on. According to sleepfoundation.org, the functions of each sleep stage are as follows:

  • Light Sleep serves as the transition from wakefulness (or from REM Sleep) into Deep Sleep.
  • Deep Sleep is responsible for repair and restoration. During this phase, the body repairs itself and the mind has a chance to rest and consolidate memories from the day.
  • REM Sleep follows the deeper stages within each cycle, before the return to Light Sleep or wakefulness. Most dreaming occurs during this phase of sleep.

From these descriptions, deep sleep stands out as one of the most important stages of sleep, since it is closely tied to how restorative our sleep is and how rested we feel in the morning. As a result, I focus on factors that affect how much deep sleep we typically get in a night.

Data Source

The data used for this study was collected by students at ENSIAS (National School for Computer Science) in Morocco, primarily through a combination of self-reported surveys, actigraphy (monitoring of sleep and activity cycles), and polysomnography (recording of vital signs and brain activity during sleep).

Methods

For this study, I analyze relationships between different variables and the amount of deep sleep a person gets in a night. I primarily use deep sleep percentage and the number of awakenings in a night as measures of sleep quality, and I also examine how some of these factors affect the total amount of sleep in a night. The higher the deep sleep percentage and the fewer the awakenings, the higher the quality of sleep. Specifically, I examine relationships involving variables that I believe could have a tangible effect on sleep quality, such as:

  • Exercise Frequency
  • Bedtime
  • Smoking Status
  • Alcohol Consumption
  • Caffeine Consumption
  • Age
  • Gender

Exercise Frequency, Bedtime, Alcohol Consumption, Caffeine Consumption, Awakenings, and Age are all initially numerical variables. However, while working with the data set, I found it more appropriate to reclassify these variables as factors, since each takes relatively few unique values. I also found the results for age and bedtime, in relation to deep sleep percentage and awakenings, easier to work with and interpret when grouped into age ranges and time windows.

Throughout the analysis, I use percentage bar charts, separated by group, to compare the distribution of awakenings across the groups of each categorical variable. I also use box plots and violin plots, separated by group, to see how each variable relates to the percentage of deep sleep people get in a night. Some observations are missing values for the variables I am interested in, but they make up a very small proportion of the data, so I replaced each missing value with the mode of its variable.

Data at a Glance

Data

Variable Explanations

Variables

  • ID = a unique identifier for each test subject
  • Age = age of subject
  • Gender = male / female
  • Bedtime = Year-Month-Date-Time
  • Wakeup time = Year-Month-Date-Time
  • Sleep duration = Amount of time between bedtime and wakeup time
  • Sleep efficiency = proportion of time in bed actually spent asleep
  • REM sleep percentage = percentage of time spent in REM sleep
  • Deep Sleep Percentage = percentage of time spent in deep sleep
  • Light Sleep Percentage = percentage of time spent in light sleep
  • Awakenings = # of times subject woke up during the night
  • Caffeine Consumption = amount of caffeine consumed in the 24 hours before bedtime (mg)
  • Alcohol Consumption = amount of alcohol consumed in the 24 hours before bedtime (oz)
  • Smoking status = Yes/No
  • Exercise Frequency = # of times the subject exercises per week
  • Hours Asleep = Amount of time actually spent asleep
    • Calculated from Sleep duration * Sleep efficiency

Exercise

Awakenings

Deep Sleep Percentage

Analysis

Analysis of Awakenings and Exercise Frequency

The percentage bar chart of Awakenings by Exercise Frequency shows a few things immediately. People who exercise 5 times a week are much more likely than other groups to have 0 awakenings during the night, and those who do wake up wake up only once. People who exercise 4 times a week are also much more likely than people who exercise less to wake up only 0-1 times in a night. Additionally, as exercise frequency increases, people are much less likely to wake up 4 times in a night. Overall, the more often people exercise, the less likely they are to wake up multiple times during the night. Based on my earlier criteria for high quality sleep, this suggests that more frequent exercise is associated with higher sleep quality.

Analysis of Deep Sleep Percentage and Exercise Frequency

The box plots of deep sleep percentage by Exercise Frequency show a few trends. First, the median deep sleep percentage of every exercise frequency group falls between 56% and 60%. The distribution of deep sleep percentage is heavily left skewed for people who exercise 0-1 times per week and slightly right skewed for people who exercise 2-5 times per week. People who exercise once a week have the largest spread, with people who don’t exercise at all close behind. People who exercise 5 times a week have the tightest spread, and people who exercise 4 times a week are the only group that combines the highest median with relatively low spread and few outliers. This suggests that people who exercise 2 or more times a week tend to have more consistent, higher quality sleep than those who exercise 0-1 times a week.

Bedtime

Awakenings

Deep Sleep Percentage

Analysis

Analysis

Smoking

Awakenings

Deep Sleep Percentage

Hours Asleep

Analysis

Analysis of Awakenings and Smoking Status

The conditional distribution of Awakenings by Smoking Status makes one thing clear: smokers and nonsmokers have very similar distributions of awakenings. Nonsmokers are marginally more likely than smokers to have 0 or 4 awakenings, and smokers are more likely than nonsmokers to wake up 1-3 times in a night. However, these differences are so small that they do not appear to be meaningful, so smoking does not seem to have a strong influence on how often someone wakes up.

Analysis of Deep Sleep Percentage and Smoking Status

The box plots show some key differences in the distribution of deep sleep percentage between smokers and nonsmokers. Nonsmokers have a slightly higher median deep sleep percentage than smokers, and the nonsmoking group contains the highest single deep sleep percentage observed. The distribution for smokers is heavily left skewed with a very large spread, while the distribution for nonsmokers is slightly right skewed with a much tighter spread. So, although the medians are comparable, nonsmokers tend to have a higher and more consistent deep sleep percentage than smokers. In other words, smoking appears to be associated with poorer sleep quality.

Analysis of Total Hours Asleep and Smoking Status

Once again, there are key differences between the distributions of hours asleep for smokers and nonsmokers. Both distributions are slightly left skewed, but nonsmokers have a median amount of sleep about 1 hour higher than smokers. The distribution of hours asleep for smokers also has a much larger spread. It is therefore fair to say that, in this sample, nonsmokers typically get more sleep in a night than smokers, and the amount of sleep they get is much more consistent.

Alcohol

Awakenings

Deep Sleep Percentage

Hours Asleep

Analysis

Analysis of Awakenings and Alcohol Consumption

The conditional distribution of Awakenings by Alcohol Consumption shows a few general trends. First, people who consumed 0 oz of alcohol in the 24 hours before bedtime are much more likely to have 0-1 awakenings in a night than to wake up 2 or more times. People who consumed only 1 oz follow a similar pattern, but they are more likely than non-drinkers to wake up once. People who drank 2 oz or more are much more likely to wake up 2 or more times, and people who drank 4-5 oz are much more likely than any other group to wake up 4 times in a night. Overall, the more alcohol someone drank, the more likely they were to wake up during the night, and to wake up multiple times.

Analysis of Deep Sleep Percentage and Alcohol Consumption

The box plots of Deep Sleep Percentage by Alcohol Consumption show a few crucial things. People who drink no alcohol have the second highest median deep sleep percentage, and their distribution has the smallest spread. People who drink 1 oz of alcohol have the highest median of all groups and the second smallest spread, although their distribution is heavily left skewed. For people who drink 2-5 oz, the median drops off sharply compared with those who drink 0-1 oz, and the distribution also has a very large spread. This suggests that drinking more than 1 oz of alcohol is associated with a substantially lower deep sleep percentage: people in these groups not only tend to get less deep sleep in a night, but the amount of deep sleep they get is also much less consistent.

Analysis of Total Hours Asleep and Alcohol Consumption

Caffeine

Awakenings

Deep Sleep Percentage

Hours Asleep

Analysis

Analysis

Biological Factors

Age Awakenings

Age Deep Sleep

Age Hours Asleep

Gender Awakenings

Gender Deep Sleep

Gender Hours Asleep

Age Analysis

Analysis 1

Gender Analysis

Analysis 2

Conclusions and Discussion

Conclusion

Summary

Limitations and Future Study

One major limitation of this study is that sleep is influenced by a lot of different and complex components that can intertwine and interact with each other. Attributing sleep quality to the percentage of deep sleep and the number of awakenings is a very large simplification that excludes many other potential factors.

Another limitation of the study is that many of the variables, such as Caffeine Consumption and Alcohol Consumption, have a very narrow range of values. As a result, I had to examine these variables as categorical variables, whereas with more data I would have treated them as quantitative variables. This limited the analysis I could perform and may hide relationships between variables.
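
The difference between the two treatments shows up directly in a linear model: a categorical variable gets one coefficient per level beyond the baseline, while a quantitative variable gets a single slope. The sketch below uses a base R `lm()` fit on hypothetical values (not the study data) purely to illustrate the two encodings.

```r
# Hypothetical toy data (not from the study)
alcohol_oz <- c(0, 0, 1, 1, 2, 2, 3, 3, 4, 5)
deep_pct   <- c(62, 60, 61, 58, 50, 47, 45, 44, 38, 35)

# Categorical encoding: one coefficient per non-baseline level
fit_factor  <- lm(deep_pct ~ factor(alcohol_oz))

# Quantitative encoding: one slope (change in deep sleep % per extra oz)
fit_numeric <- lm(deep_pct ~ alcohol_oz)

length(coef(fit_factor))   # intercept + 5 level effects = 6
length(coef(fit_numeric))  # intercept + 1 slope = 2
```

The quantitative fit spends far fewer parameters, which matters when some levels contain only a handful of subjects.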

For future studies, I would include more variables that could affect sleep, such as diet or screen time before bed, to gain a more accurate sense of what drives sleep quality. I would also collect more detailed data with a wider range of values, so that these variables could more easily be treated as quantitative.

About the Author

I am Bryan Kohler, currently a junior undergraduate student at the University of Dayton. I am pursuing a Bachelor's degree in Computer Science with a minor in Data Analytics, and I expect to graduate in May 2024.

https://www.linkedin.com/in/bryan-kohler/

---
title: "Catching Some Z's"
output: 
  flexdashboard::flex_dashboard:
    theme:
        version: 4
        bootswatch: journal
        primary: "#042975"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
```

<style>
.chart-title {  /* chart_title  */
   font-size: 20px;
  }
body{ /* Normal  */
      font-size: 16px;
  }
</style>

Introduction
===


Column {data-width=450}
-----------------------------------------------------------------------

### Abstract 

<font size = 4>**Catching Some Z's: An Analysis of Factors of a Good Night's Sleep**</font>

Abstract

### Research Questions 

  - Do people who exercise generally have higher quality sleep?
  
  - Is there any correlation between bedtime and the amount of deep sleep someone gets?
  
  - Do smoking, alcohol, and caffeine negatively impact the amount of deep sleep we get in a night or just sleep in general?
  
  - Do age or gender have an impact on the amount and quality of sleep?

Column {.tabset data-width=550}
-----------------------------------------------------------------------

### Background and Significance

**Significance**

  Sleep affects almost every aspect of our daily lives. It gives us energy throughout the day and allows us to recharge mentally and physically. However, it often seems like we can't get enough of it. Sleep is a complex process with many moving parts, and this study aims to shed light on which factors affect it and what we can do to get more, and better quality, sleep.  

**Background**

  Sleep is primarily composed of 3 broad stages: light sleep, deep sleep, and REM sleep. As you sleep, you cycle through these stages several times, with the deep sleep periods growing shorter and the REM periods growing longer as the night goes on. According to sleepfoundation.org, the functions of each sleep stage are as follows:
  
  - Light Sleep serves as the transition from wakefulness (or from REM Sleep) into Deep Sleep.
  - Deep Sleep is responsible for repair and restoration. During this phase, the body repairs itself and the mind has a chance to rest and consolidate memories from the day.
  - REM Sleep follows the deeper stages within each cycle, before the return to Light Sleep or wakefulness. Most dreaming occurs during this phase of sleep. 
  
From these descriptions, deep sleep stands out as one of the most important stages of sleep, since it is closely tied to how restorative our sleep is and how rested we feel in the morning. As a result, I focus on factors that affect how much deep sleep we typically get in a night.

**Data Source**

  The data used for this study was collected by students at ENSIAS (National School for Computer Science) in Morocco, primarily through a combination of self-reported surveys, actigraphy (monitoring of sleep and activity cycles), and polysomnography (recording of vital signs and brain activity during sleep). 

### Methods

For this study, I analyze relationships between different variables and the amount of deep sleep a person gets in a night. I primarily use deep sleep percentage and the number of awakenings in a night as measures of sleep quality, and I also examine how some of these factors affect the total amount of sleep in a night. The higher the deep sleep percentage and the fewer the awakenings, the higher the quality of sleep. Specifically, I examine relationships involving variables that I believe could have a tangible effect on sleep quality, such as: 

  - Exercise Frequency
  - Bedtime
  - Smoking Status 
  - Alcohol Consumption 
  - Caffeine Consumption
  - Age 
  - Gender 
  
Exercise Frequency, Bedtime, Alcohol Consumption, Caffeine Consumption, Awakenings, and Age are all initially numerical variables. However, while working with the data set, I found it more appropriate to reclassify these variables as factors, since each takes relatively few unique values. I also found the results for age and bedtime, in relation to deep sleep percentage and awakenings, easier to work with and interpret when grouped into age ranges and time windows. 
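
One way to automate this reclassification is to convert any numeric column with only a handful of distinct values into a factor whose levels stay in numeric order. This is an illustrative sketch only: the helper `to_factor_if_few()`, its `max_levels` cutoff, and the toy data frame are hypothetical and are not part of the original analysis, which sets column types directly in `read_csv()`.

```r
# Hypothetical toy data for illustration only (not from the study)
toy <- data.frame(
  Exercise.frequency    = c(0, 1, 3, 3, 5, 1, 0),
  Deep.sleep.percentage = c(52, 60, 58, 61, 63, 45, 57)
)

# Hypothetical helper: convert numeric columns with at most `max_levels`
# distinct non-missing values into factors, keeping levels in numeric order
to_factor_if_few <- function(df, max_levels = 10) {
  for (col in names(df)) {
    x <- df[[col]]
    vals <- sort(unique(x[!is.na(x)]))
    if (is.numeric(x) && length(vals) <= max_levels) {
      df[[col]] <- factor(x, levels = vals)
    }
  }
  df
}

converted <- to_factor_if_few(toy, max_levels = 6)
str(converted)
```

Here Exercise.frequency (4 distinct values) becomes a factor, while Deep.sleep.percentage (7 distinct values) stays numeric.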

Throughout the analysis, I use percentage bar charts, separated by group, to compare the distribution of awakenings across the groups of each categorical variable. I also use box plots and violin plots, separated by group, to see how each variable relates to the percentage of deep sleep people get in a night. Some observations are missing values for the variables I am interested in, but they make up a very small proportion of the data, so I replaced each missing value with the mode of its variable. 
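
The mode imputation described above can also be sketched in base R, independently of the dplyr-based helper in the data setup chunk. This is a minimal sketch under two assumptions: the vector has at least one non-missing value, and ties are broken by whichever value `table()` lists first. The name `impute_mode()` and the example vector are hypothetical.

```r
# Hypothetical helper: replace NAs in a vector with its most frequent value.
# Assumes at least one non-missing value; ties go to the first value in table() order.
impute_mode <- function(x) {
  counts <- table(x)                              # table() drops NAs by default
  mode_value <- names(counts)[which.max(counts)]  # names are character strings
  if (is.numeric(x)) mode_value <- as.numeric(mode_value)
  x[is.na(x)] <- mode_value                       # also works for factors
  x
}

awakenings <- c(1, 0, NA, 1, 3, 1, NA, 4)  # hypothetical values
impute_mode(awakenings)
```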

Data at a Glance
===

```{r dataSetup}
pacman::p_load(DT, knitr, tidyverse, lubridate, plotly, conflicted)
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")

sleep <- read_csv("Sleep_Efficiency.csv", col_types = "ddfTTdddddfffff")


colnames(sleep) <- make.names(colnames(sleep))

sleep$Bedtime <- format(sleep$Bedtime, "%Y-%m-%d %H:%M:%S")
sleep$Wakeup.time <- format(sleep$Wakeup.time, "%Y-%m-%d %H:%M:%S")

sleep <- sleep %>%
  mutate(Hours.asleep = Sleep.duration * Sleep.efficiency)
    #Deep.sleep.hours = Sleep.duration * Sleep.efficiency * Deep.sleep.percentage / 100, 
        # REM.sleep.hours = Sleep.duration * Sleep.efficiency * REM.sleep.percentage / 100,
        # Light.sleep.hours = Sleep.duration * Sleep.efficiency * Light.sleep.percentage / 100,
        

# Most frequent non-missing value of column `x` (a string) in data frame `df`.
# Named stat_mode() so it does not mask base::mode().
stat_mode <- function(df, x) {
  x_sym <- sym(x)                  # Convert the string x to a symbol
  df %>%
    filter(!is.na(!!x_sym)) %>%    # Ignore NAs so the mode itself is never NA
    count(!!x_sym) %>%             # !! unquotes x_sym so it is treated as a column name
    arrange(desc(n)) %>%
    slice(1) %>%                   # Keep the most frequent value...
    pull(!!x_sym)                  # ...and extract it
}

# Replace missing values in each of these columns with that column's mode
for (col in c("Awakenings", "Caffeine.consumption",
              "Alcohol.consumption", "Exercise.frequency")) {
  sleep[[col]][is.na(sleep[[col]])] <- stat_mode(sleep, col)
}

### Formatting for pop up text boxes

font <- list(
  family = "Arial",
  size = 15,
  color = "white"
)

label <- list(
  bgcolor = "#707372", 
  bordercolor = "transparent",
  font = font
)
```


Column {data-width=650}
-----------------------------------------------------------------------

### Data 

```{r show_table}

datatable(sleep, rownames = FALSE, colnames = (c("ID", "Age", "Gender", "Bedtime", "Wakeup Time", "Sleep Duration", "Sleep Efficiency", "REM Sleep Percentage", "Deep Sleep Percentage", "Light Sleep Percentage", "Awakenings", "Caffeine Consumption (mg)", "Alcohol Consumption (oz)", "Smoking Status", "Exercise Frequency", "Hours Asleep")),
          options = list(columnDefs = list(list(className = 'dt-center', targets = 1:5)), pageLength = 20))


```

Column {data-width=350}
-----------------------------------------------------------------------

### Variable Explanations

**Variables**

  - ID = a unique identifier for each test subject
  - Age = age of subject
  - Gender = male / female
  - Bedtime = Year-Month-Date-Time
  - Wakeup time = Year-Month-Date-Time
  - Sleep duration = Amount of time between bedtime and wakeup time
  - Sleep efficiency = proportion of time in bed actually spent asleep
  - REM sleep percentage = percentage of time spent in REM sleep
  - Deep Sleep Percentage = percentage of time spent in deep sleep
  - Light Sleep Percentage = percentage of time spent in light sleep
  - Awakenings = # of times subject woke up during the night
  - Caffeine Consumption = amount of caffeine consumed in the 24 hours before bedtime (mg)
  - Alcohol Consumption = amount of alcohol consumed in the 24 hours before bedtime (oz)
  - Smoking status = Yes/No
  - Exercise Frequency = # of times the subject exercises per week
  - Hours Asleep = Amount of time actually spent asleep 
      - Calculated from Sleep duration * Sleep efficiency
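
As a worked example of the derived column, a subject with 8 hours between bedtime and wakeup time and a sleep efficiency of 0.88 is credited with 8 × 0.88 = 7.04 hours asleep. The same arithmetic extends to the hours spent in each stage, assuming (as the commented-out lines in the data setup chunk do) that the stage percentages are shares of time asleep. The helper name and inputs below are hypothetical.

```r
# Hypothetical helper: derive hours asleep and hours of deep sleep
sleep_hours <- function(duration, efficiency, deep_pct) {
  hours_asleep <- duration * efficiency          # Sleep duration * Sleep efficiency
  deep_hours   <- hours_asleep * deep_pct / 100  # deep sleep share of time asleep
  data.frame(hours_asleep = hours_asleep, deep_hours = deep_hours)
}

sleep_hours(duration = 8, efficiency = 0.88, deep_pct = 60)
```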

Exercise
===

Column {.tabset data-width=550}
-----------------------------------------------------------------------

### Awakenings

```{r ExerciseAwakenings}

sleep$Exercise.frequency <- factor(sleep$Exercise.frequency, levels = c("0.0","1.0","2.0","3.0","4.0","5.0"))
sleep$Awakenings <- factor(sleep$Awakenings, levels = c("0.0","1.0","2.0","3.0","4.0","5.0"))

ggplot(sleep, aes(x = Exercise.frequency, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Exercise Frequency") +
  ylab("Percentage of Subjects") + 
  ggtitle("Distribution of Awakenings by Exercise Frequency") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  # Use a single fill scale: adding a second scale_fill_*() replaces the first
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

### Deep Sleep Percentage 

```{r ExerciseSleep}

ggplot(sleep, aes(x = Exercise.frequency, y = Deep.sleep.percentage)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Exercise Frequency") + 
  ylab("Deep Sleep Percentage ") + 
  ggtitle("Distribution of Deep Sleep Percentage by Exercise Frequency") + 
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

Column {data-width=450}
----------------------

### Analysis
 
**Analysis of Awakenings and Exercise Frequency**

The percentage bar chart of Awakenings by Exercise Frequency shows a few things immediately. People who exercise 5 times a week are much more likely than other groups to have 0 awakenings during the night, and those who do wake up wake up only once. People who exercise 4 times a week are also much more likely than people who exercise less to wake up only 0-1 times in a night. Additionally, as exercise frequency increases, people are much less likely to wake up 4 times in a night. Overall, the more often people exercise, the less likely they are to wake up multiple times during the night. Based on my earlier criteria for high quality sleep, this suggests that more frequent exercise is associated with higher sleep quality. 
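
The percentages in this chart are conditional proportions: within each exercise group, the share of subjects with each number of awakenings. A base R sketch of that calculation, using a small hypothetical data frame rather than the study data:

```r
# Hypothetical toy data (not from the study)
toy <- data.frame(
  Exercise.frequency = factor(c(0, 0, 0, 3, 3, 5, 5, 5)),
  Awakenings         = factor(c(1, 3, 4, 1, 2, 0, 0, 1))
)

# Counts of awakenings within each exercise group (rows = exercise groups)
counts <- table(toy$Exercise.frequency, toy$Awakenings)

# margin = 1 divides each row by its total, so every row sums to 1;
# this is the quantity geom_bar(position = "fill") displays
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```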

**Analysis of Deep Sleep Percentage and Exercise Frequency**

The box plots of deep sleep percentage by Exercise Frequency show a few trends. First, the median deep sleep percentage of every exercise frequency group falls between 56% and 60%. The distribution of deep sleep percentage is heavily left skewed for people who exercise 0-1 times per week and slightly right skewed for people who exercise 2-5 times per week. People who exercise once a week have the largest spread, with people who don't exercise at all close behind. People who exercise 5 times a week have the tightest spread, and people who exercise 4 times a week are the only group that combines the highest median with relatively low spread and few outliers. This suggests that people who exercise 2 or more times a week tend to have more consistent, higher quality sleep than those who exercise 0-1 times a week. 
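
Comparisons of center and spread like these can be backed with numbers instead of being read off the plot. A minimal sketch, on hypothetical values, that reports the median and interquartile range (IQR) of deep sleep percentage per group using base R's `aggregate()`:

```r
# Hypothetical toy data (not from the study)
toy <- data.frame(
  Exercise.frequency    = factor(c(0, 0, 0, 0, 4, 4, 4, 4)),
  Deep.sleep.percentage = c(22, 40, 58, 60, 56, 59, 61, 62)
)

# Median and IQR of deep sleep percentage for each exercise group;
# the result stores both statistics in a matrix column
group_summary <- aggregate(
  Deep.sleep.percentage ~ Exercise.frequency,
  data = toy,
  FUN  = function(x) c(median = median(x), IQR = IQR(x))
)
group_summary
```

A large gap between the median and the lower quartile, as in the toy "0" group, is one numeric sign of the left skew described above.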

Bedtime
===

Column {.tabset data-width=550}
-------------------------------

### Awakenings

```{r BedtimeAwakenings}
# Process the 'Bedtime' column: it was converted to a character string above,
# so parse it back to a date-time before extracting the hour (lubridate)
sleep$Hour <- hour(ymd_hms(sleep$Bedtime))

# Label each bedtime with its one-hour window, e.g. 22 -> "22-23:00".
# Hours outside 21:00-03:59 are left as NA.
categorize_hour <- function(hour) {
  if (hour >= 21 || hour <= 3) {
    paste0(hour, "-", hour + 1, ":00")
  } else {
    NA_character_
  }
}

sleep$Time_Category <- sapply(sleep$Hour, categorize_hour)

# Labels not listed in `levels` (e.g. "3-4:00") become NA here
sleep$Time_Category <- factor(sleep$Time_Category, levels = c("21-22:00", "22-23:00", "23-24:00","0-1:00", "1-2:00", "2-3:00"))

ggplot(sleep, aes(x = Time_Category, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Bedtime (Military Time)") +
  ylab("Percentage of Subjects") + 
  ggtitle("Distribution of Awakenings by Bedtime") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

### Deep Sleep Percentage

```{r BedtimeSleep}

p <- plot_ly(data = sleep, 
             x = ~Time_Category, 
             y = ~Deep.sleep.percentage, 
             type = 'violin', 
             fillcolor = 'turquoise', 
             box = list(visible = TRUE, width = 0.1, 
                        line = list(color = 'black'))) %>%
      plotly::layout(
        title = "Distribution of Deep Sleep Percentage by Bedtime",
        xaxis = list(title = "Bedtime (Military Time)"),
        yaxis = list(title = "Deep Sleep Percentage"),
        violinmode = "overlay"
      )
p %>% 
  style(hoverlabel = label)

```

Column {data-width=450}
----------------------

### Analysis

Analysis

Smoking
===

Column {.tabset data-width=550}
-------------------------------

### Awakenings

```{r smokingAwakenings}
ggplot(sleep, aes(x = Smoking.status, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Smoking Status") +
  ylab("Percentage of Subjects") + 
  ggtitle("Distribution of Awakenings by Smoking Status") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

### Deep Sleep Percentage

```{r smokingDeepSleep}
ggplot(sleep, aes(x = Smoking.status, y = Deep.sleep.percentage)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Smoking Status") + 
  ylab("Deep Sleep Percentage") + 
  ggtitle("Distribution of Deep Sleep Percentage by Smoking Status") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

### Hours Asleep

```{r smokingAsleep}
ggplot(sleep, aes(x = Smoking.status, y = Hours.asleep)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Smoking Status") + 
  ylab("Hours Asleep") + 
  ggtitle("Distribution of Hours Asleep by Smoking Status") +
  theme(plot.title = element_text(hjust = 0.5)) -> p
 
ggplotly(p) %>% 
  style(hoverlabel = label)
```

Column {data-width=450}
----------------------

### Analysis

**Analysis of Awakenings and Smoking Status**

The conditional distribution of Awakenings by Smoking Status makes one thing clear: smokers and nonsmokers have very similar distributions of awakenings. Nonsmokers are marginally more likely than smokers to have 0 or 4 awakenings, and smokers are more likely than nonsmokers to wake up 1-3 times in a night. However, these differences are so small that they do not appear to be meaningful, so smoking does not seem to have a strong influence on how often someone wakes up. 
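
Whether differences like these are statistically significant can be checked with a test of independence between the two categorical variables. The source does not show which test, if any, was run; the sketch below assumes Fisher's exact test (`fisher.test()` in base R), which suits tables with small cell counts, and uses hypothetical toy data.

```r
# Hypothetical toy data (not from the study)
toy <- data.frame(
  Smoking.status = c(rep("No", 10), rep("Yes", 10)),
  Awakenings     = c(0, 0, 1, 1, 1, 2, 3, 4, 0, 1,
                     0, 1, 1, 2, 2, 3, 1, 4, 0, 1)
)

# 2 x 5 contingency table: smoking status by number of awakenings
counts <- table(toy$Smoking.status, toy$Awakenings)

# H0: the number of awakenings is independent of smoking status
result <- fisher.test(counts)
result$p.value
```

A large p-value would be consistent with the "no strong influence" reading; it would not prove that the distributions are identical.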

**Analysis of Deep Sleep Percentage and Smoking Status**

The box plots show some key differences in the distribution of deep sleep percentage between smokers and nonsmokers. Nonsmokers have a slightly higher median deep sleep percentage than smokers, and the nonsmoking group contains the highest single deep sleep percentage observed. The distribution for smokers is heavily left skewed with a very large spread, while the distribution for nonsmokers is slightly right skewed with a much tighter spread. So, although the medians are comparable, nonsmokers tend to have a higher and more consistent deep sleep percentage than smokers. In other words, smoking appears to be associated with poorer sleep quality. 
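
A rank-based comparison is one way to check whether two skewed distributions like these differ, without assuming normality. The sketch assumes the Wilcoxon rank-sum (Mann-Whitney) test from base R's `wilcox.test()`; the data are hypothetical.

```r
# Hypothetical toy data (not from the study)
nonsmokers <- c(55, 58, 60, 61, 62, 63, 64, 66)
smokers    <- c(20, 25, 40, 52, 57, 59, 60, 70)

# exact = FALSE uses the normal approximation, which also handles tied values
test <- wilcox.test(nonsmokers, smokers, exact = FALSE)

c(median_nonsmokers = median(nonsmokers),
  median_smokers    = median(smokers),
  p_value           = test$p.value)
```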

**Analysis of Total Hours Asleep and Smoking Status**

Once again, there are key differences between the distributions of hours asleep for smokers and nonsmokers. Both distributions are slightly left skewed, but nonsmokers have a median amount of sleep about 1 hour higher than smokers. The distribution of hours asleep for smokers also has a much larger spread. It is therefore fair to say that, in this sample, nonsmokers typically get more sleep in a night than smokers, and the amount of sleep they get is much more consistent.  
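
To gauge how stable a gap in median sleep like this is, one option is a bootstrap confidence interval for the difference in medians. This is a sketch rather than the study's method; the data, the number of resamples (2000), and the seed are hypothetical choices.

```r
# Hypothetical toy data in hours (not from the study)
nonsmokers <- c(6.5, 7.0, 7.2, 7.4, 7.5, 7.8, 8.0, 8.1)
smokers    <- c(4.0, 5.2, 5.9, 6.2, 6.4, 6.8, 7.1, 7.9)

set.seed(42)  # make the resampling reproducible
boot_diffs <- replicate(2000, {
  # Resample each group with replacement and compare the medians
  median(sample(nonsmokers, replace = TRUE)) -
    median(sample(smokers, replace = TRUE))
})

observed <- median(nonsmokers) - median(smokers)
ci <- quantile(boot_diffs, c(0.025, 0.975))  # 95% percentile interval
c(observed = observed, ci)
```

An interval that stays well above 0 would support the claim that nonsmokers sleep longer; a wide interval would reflect the larger spread among smokers.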


Alcohol
===

Column {.tabset data-width=550}
-------------------------------

### Awakenings

```{r AlcoholAwakenings}

sleep$Alcohol.consumption <- factor(sleep$Alcohol.consumption, levels = c("0.0", "1.0", "2.0", "3.0", "4.0", "5.0"))

ggplot(sleep, aes(x = Alcohol.consumption, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Alcohol Consumption (oz)") +
  ylab("Percentage of Subjects") + 
  ggtitle("Distribution of Awakenings by Alcohol Consumption") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

### Deep Sleep Percentage

```{r AlcoholDeepSleep}
sleep$Alcohol.consumption <- factor(sleep$Alcohol.consumption, levels = c("0.0", "1.0", "2.0", "3.0", "4.0", "5.0"))

ggplot(sleep, aes(x = Alcohol.consumption, y = Deep.sleep.percentage)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Alcohol Consumption (oz)") + 
  ylab("Deep Sleep Percentage") + 
  ggtitle("Distribution of Deep Sleep Percentage by Alcohol Consumption") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)

```

### Hours Asleep

```{r AlcoholAsleep}
ggplot(sleep, aes(x = Alcohol.consumption, y = Hours.asleep)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Alcohol Consumption (oz)") + 
  ylab("Hours Asleep") + 
  ggtitle("Distribution of Hours Asleep by Alcohol Consumption") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

Column {data-width=450}
----------------------

### Analysis

**Analysis of Awakenings and Alcohol Consumption**

The conditional distribution of Awakenings by Alcohol Consumption shows a few general trends. First, people who consumed 0 oz of alcohol in the 24 hours before bedtime are much more likely to have 0 or 1 awakenings in a night than to wake up 2 or more times. People who consumed 1 oz follow a similar pattern, but they are more likely to wake up once than people who had no alcohol. From 2 oz upward, people are much more likely to wake up 2 or more times, and people who drank 4-5 oz are much more likely than any other group to wake up 4 times in a night. Overall, the more alcohol someone drinks, the more likely they are to wake up during the night, and the more likely they are to wake up multiple times.
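
Each bar in the Awakenings chart is built with `geom_bar(position = "fill")`, which rescales the counts within each alcohol level to sum to 100%, so every bar is the conditional distribution of Awakenings given Alcohol Consumption. The same proportions can be tabulated with base R's `table()` and `prop.table()`, which makes it easier to quote exact percentages. The toy data below are made up for illustration.

```r
# Made-up toy data (illustration only)
toy <- data.frame(
  Alcohol.consumption = factor(c("0.0", "0.0", "0.0", "1.0", "1.0", "2.0", "2.0", "2.0")),
  Awakenings = factor(c(0, 1, 0, 1, 1, 2, 3, 2))
)

# Counts of each number of awakenings within each alcohol level
counts <- table(Alcohol = toy$Alcohol.consumption, Awakenings = toy$Awakenings)

# margin = 1 divides each row by its total: P(Awakenings | Alcohol)
cond <- prop.table(counts, margin = 1)
print(round(100 * cond, 1))  # percentages; each row sums to 100
```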

**Analysis of Deep Sleep Percentage and Alcohol Consumption**
 
The box plots of Deep Sleep Percentage by Alcohol Consumption show a few crucial things. People who drink no alcohol have the second-highest median deep sleep percentage, and their distribution has the smallest spread. People who drink 1 oz have the highest median of all groups and the second-smallest spread, although their distribution is heavily left skewed. For people who drink 2-5 oz of alcohol in a day, the median drops off sharply compared with the 0 oz and 1 oz groups, and these distributions also have a very large spread. This suggests that drinking more than 1 oz of alcohol is associated with a substantially lower deep sleep percentage. People in these groups not only tend to get less deep sleep on a given night, but the amount of deep sleep they get is also much less consistent.
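
The drop-off after 1 oz can be summarized by pooling the levels into two bands and comparing the median and IQR of each. Because the alcohol column is stored as text levels such as "1.0", it has to go through `as.character()` before `as.numeric()`; calling `as.numeric()` directly on a factor returns the internal level codes rather than the ounces. The data and the 0-1 oz / 2-5 oz split are illustrative assumptions.

```r
# Made-up toy data (illustration only), stored as text levels like the real column
toy <- data.frame(
  Alcohol.consumption = factor(c("0.0", "0.0", "1.0", "1.0", "2.0", "3.0", "4.0", "5.0")),
  Deep.sleep.percentage = c(60, 62, 65, 58, 30, 45, 25, 55)
)

# Convert factor labels to ounces (as.numeric() alone would return level codes)
oz <- as.numeric(as.character(toy$Alcohol.consumption))
toy$Alcohol.band <- ifelse(oz <= 1, "0-1 oz", "2-5 oz")

# Median and IQR of deep sleep percentage in each band
band_stats <- t(sapply(split(toy$Deep.sleep.percentage, toy$Alcohol.band),
                       function(x) c(median = median(x), IQR = IQR(x))))
print(band_stats)
```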

**Analysis of Total Hours Asleep and Alcohol Consumption**




Caffeine
===

Column {.tabset data-width=550}
-------------------------------

### Awakenings

```{r CaffeineAwakenings}

sleep <- sleep %>% 
  filter(Caffeine.consumption != "100.0") # only one participant reported 100 mg, too few to plot

# "100.0" is no longer a valid level after the filter above
sleep$Caffeine.consumption <- factor(sleep$Caffeine.consumption, levels = c("0.0", "25.0", "50.0", "75.0", "200.0"))

ggplot(sleep, aes(x = Caffeine.consumption, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Caffeine Consumption (mg)") +
  ylab("Percentage of Participants in Group") + 
  ggtitle("Distribution of Awakenings by Caffeine Consumption") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  # One fill scale sets both the legend title and the colors
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Deep Sleep Percentage 

```{r CaffeineDeepSleep}
ggplot(sleep, aes(x = Caffeine.consumption, y = Deep.sleep.percentage)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Caffeine Consumption (mg)") + 
  ylab("Deep Sleep Percentage") + 
  ggtitle("Distribution of Deep Sleep Percentage by Caffeine Consumption (mg)") +
  theme(plot.title = element_text(hjust = 0.5)) -> p 

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Hours Asleep

```{r CaffeineAsleep}
ggplot(sleep, aes(x = Caffeine.consumption, y = Hours.asleep)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Caffeine Consumption (mg)") + 
  ylab("Hours Asleep") + 
  ggtitle("Distribution of Hours Asleep by Caffeine Consumption (mg)") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```


Column {data-width=450}
----------------------

### Analysis

Analysis 

Biological Factors
===

Column {.tabset data-width=600}
----------------------------------

### Age Awakenings

```{r AgeAwakenings}

sleep <- sleep %>% 
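  mutate(Age.group = case_when( 
    Age < 20 ~ "Under 20",
    Age >= 20 & Age < 30 ~ "20-29",
    Age >= 30 & Age < 40 ~ "30-39",
    Age >= 40 & Age < 50 ~ "40-49",
    Age >= 50 & Age < 60 ~ "50-59",
    Age >= 60 & Age < 70 ~ "60-69"
  ),
  # Explicit levels keep the groups in age order on the x-axis
  Age.group = factor(Age.group, levels = c("Under 20", "20-29", "30-39", "40-49", "50-59", "60-69")))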

ggplot(sleep, aes(x = Age.group, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Age Group") +
  ylab("Percentage of Age Group") + 
  ggtitle("Distribution of Awakenings by Age Group") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  # One fill scale sets both the legend title and the colors
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Age Deep Sleep 

```{r AgeDeepSleep}
ggplot(sleep, aes(x = Age.group, y = Deep.sleep.percentage)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Age Group") + 
  ylab("Deep Sleep Percentage") + 
  ggtitle("Distribution of Deep Sleep Percentage by Age Group") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Age Hours Asleep 

```{r AgeAsleep}
ggplot(sleep, aes(x = Age.group, y = Hours.asleep)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Age Group") + 
  ylab("Hours Asleep") + 
  ggtitle("Distribution of Hours Asleep by Age Group") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Gender Awakenings 

```{r GenderAwakenings}
ggplot(sleep, aes(x = Gender, fill = Awakenings)) + 
  geom_bar(position = "fill") + 
  xlab("Gender") +
  ylab("Percentage of Gender") + 
  ggtitle("Distribution of Awakenings by Gender") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  # One fill scale sets both the legend title and the colors
  scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) + 
  scale_y_continuous(breaks = seq(0, 1, by = .2), labels = scales::percent) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Gender Deep Sleep 

```{r GenderDeepSleep}
ggplot(sleep, aes(x = Gender, y = Deep.sleep.percentage)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Gender") + 
  ylab("Deep Sleep Percentage") + 
  ggtitle("Distribution of Deep Sleep Percentage by Gender") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

### Gender Hours Asleep 

```{r GenderAsleep}
ggplot(sleep, aes(x = Gender, y = Hours.asleep)) + 
  geom_boxplot(fill = "turquoise") + 
  xlab("Gender") + 
  ylab("Hours Asleep") + 
  ggtitle("Distribution of Hours Asleep by Gender") +
  theme(plot.title = element_text(hjust = 0.5)) -> p

ggplotly(p) %>% 
  style(hoverlabel = label)
```

Column {.tabset data-width=400}
----------------------------------

### Age Analysis 

Analysis 1
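
The age groups in this section are built with `case_when()`. A base R alternative is `cut()` with `right = FALSE`, which makes every interval include its lower bound and exclude its upper bound, matching conditions like `Age >= 20 & Age < 30`; it also returns a factor whose levels follow the order of the breaks, so the groups plot from youngest to oldest. The group labels below are illustrative, and ages of 70 or above fall outside every bin and become `NA`, just as they would with the `case_when()` version.

```r
# Illustrative ages, including boundary values
ages <- c(9, 19, 20, 29, 30, 45, 59, 60, 69)

# [lower, upper) bins; -Inf catches everyone under 20
age_breaks <- c(-Inf, 20, 30, 40, 50, 60, 70)
age_labels <- c("Under 20", "20-29", "30-39", "40-49", "50-59", "60-69")

age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)
print(table(age_group))  # levels appear youngest to oldest
```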


### Gender Analysis

Analysis 2


Conclusions and Discussion
===

Column { data-width=500}
-------------------------

### Conclusion 

Summary 


Column { data-width=500}
-------------------------

### Limitations and Future Study

  One major limitation of this study is that sleep is influenced by many complex factors that interact with one another. Measuring sleep quality through deep sleep percentage and the number of awakenings is a large simplification that leaves out many other potential factors. 

  Another limitation of the study is that many of the variables, such as Caffeine Consumption and Alcohol Consumption, take only a few distinct values. As a result, I had to examine these variables as categorical, whereas with more data I would have treated them as quantitative. This limited the analysis I could perform and may have hidden relationships between variables.
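
As a sketch of what a quantitative treatment could look like, the doses can be converted back to numbers and summarized with a rank-based (Spearman) correlation, which only assumes the relationship is monotonic. With so few distinct doses, many ranks are tied, which is part of the limitation described above. The data below are made up for illustration and are not from the Kaggle dataset.

```r
# Made-up toy data (illustration only), stored as text levels like the real column
toy <- data.frame(
  Caffeine.consumption = factor(c("0.0", "0.0", "25.0", "50.0", "50.0", "75.0", "200.0")),
  Deep.sleep.percentage = c(62, 58, 60, 55, 50, 48, 40)
)

# Treat caffeine as a quantity in mg instead of a category
caffeine_mg <- as.numeric(as.character(toy$Caffeine.consumption))

# Spearman's rho ranges from -1 to 1; negative means higher caffeine ranks
# go with lower deep sleep ranks
rho <- cor(caffeine_mg, toy$Deep.sleep.percentage, method = "spearman")
cat("Spearman's rho:", round(rho, 3), "\n")
```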

  For future studies, I would include more variables that could affect sleep, such as diet or screen time before bed, to gain a more accurate picture of what drives sleep quality. Additionally, I would collect more detailed data with a wider range of values, so that these variables could be treated as quantitative. 

### About the Author 

  I am Bryan Kohler, a junior undergraduate student at the University of Dayton. I am pursuing a Bachelor's degree in Computer Science with a minor in Data Analytics, and I expect to graduate in May 2024.

https://www.linkedin.com/in/bryan-kohler/

### References 

**Data Source**

- https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency

**Sleep Background Information** 

- https://www.sleepfoundation.org/stages-of-sleep/deep-sleep